Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use
Cesista, Franz Louis, Aguiar, Rui, Kim, Jason, Acilo, Paolo
Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks. The contributions of this paper are threefold: (1) we show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multimodal Models (LMMs) without RASG on BDIE benchmarks; (2) we propose a new metric class for Line Items Recognition, the General Line Items Recognition Metric (GLIRM), which is better aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS; (3) we provide a heuristic algorithm for back-calculating the bounding boxes of predicted line items and tables without the need for vision encoders. Finally, we argue that, while LMMs might sometimes offer marginal performance benefits, LLMs + RASG are often superior given the real-world applications and constraints of BDIE.
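The "extraction as tool use" framing can be sketched as follows: a downstream system is exposed as a tool whose required input fields define the extraction schema, and the LLM's structured output is validated against that schema before the tool is invoked. This is a minimal illustrative sketch, not the paper's actual implementation; the schema, field names, and helper function are hypothetical.

```python
import json

# Hypothetical "tool" schema: the input contract of a downstream system
# (e.g. an invoice-registration service) doubles as the KIE schema.
INVOICE_TOOL_SCHEMA = {
    "name": "register_invoice",
    "required": ["invoice_number", "total_amount", "currency"],
}

def validate_tool_call(raw_output: str, schema: dict) -> dict:
    """Parse an LLM's structured output and check it satisfies the tool
    schema. In a full RASG-style loop, a failure here could trigger
    retrieval of similar documents and a constrained re-generation."""
    payload = json.loads(raw_output)
    missing = [f for f in schema["required"] if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return payload

# Illustrative LLM output (synthetic example):
llm_output = '{"invoice_number": "INV-001", "total_amount": 99.5, "currency": "USD"}'
print(validate_tool_call(llm_output, INVOICE_TOOL_SCHEMA))
```

Treating the downstream system's contract as the single source of truth for the schema is what makes the extraction output directly consumable, rather than a free-form summary that still needs parsing.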
Operationalising Representation in Natural Language Processing
Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.
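The probing methodology the paper operationalises can be sketched in a few lines: train a simple supervised classifier to predict a property from a model component's representations, and treat high held-out accuracy as (defeasible) evidence that the property is encoded. Everything below, including the synthetic "representations," is an illustrative toy and not from the paper.

```python
import math
import random

random.seed(0)

# Toy stand-in for frozen model representations: 4-d vectors in which
# dimension 0 carries a binary property, the rest is noise (all synthetic).
def make_example():
    label = random.randint(0, 1)
    vec = [label * 2.0 - 1.0 + random.gauss(0, 0.3)]
    vec += [random.gauss(0, 1) for _ in range(3)]
    return vec, label

train = [make_example() for _ in range(200)]
held_out = [make_example() for _ in range(100)]

# Linear probe: logistic regression trained by plain gradient descent.
w, b, lr = [0.0] * 4, 0.0, 0.1
for epoch in range(50):
    for x, y in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y  # gradient of the log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

def probe_accuracy(data):
    correct = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(data)

print(probe_accuracy(held_out))
```

High probe accuracy alone is exactly the kind of evidence the paper's three criteria are meant to discipline: a property being decodable from a component is necessary but not obviously sufficient for the component to represent it.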
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Lian, Yijiang, Chen, Zhijie, Pei, Xin, Li, Shuang, Wang, Yifei, Qiu, Yuefeng, Zhang, Zhiheng, Tao, Zhipeng, Yuan, Liang, Guan, Hanju, Zhang, Kefeng, Li, Zhigang, Liu, Xiaochun
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially; a query with high commercial value might retrieve more candidates than the ranking module can afford to process. Due to limited latency and computing resources, the candidates have to be pruned early. Suppose we draw a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is how to pick the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction is that the selection scheme must adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach treats the downstream as a black-box environment: the agent sequentially selects items and finally feeds them into the downstream, where revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time the system optimization has been considered from a downstream-adaptation view, and the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and long-running online A/B tests show remarkable improvements in revenue.
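The black-box setup above can be sketched in a few dozen lines. This is a toy, self-contained illustration under my own assumptions: the "downstream" is a stand-in function the agent cannot inspect, the policy is a simple softmax over per-item logits, and the update is a crude score-function (REINFORCE-style) surrogate with a running baseline. None of this is Baidu's actual model.

```python
import math
import random

random.seed(0)

N, K = 10, 3  # pick the best K of N candidates provided by the upstream

def downstream_revenue(selected):
    """Black-box downstream: the agent never sees these hidden values,
    only the scalar revenue returned for a complete selection."""
    hidden_value = [0.1 * i for i in range(N)]
    return sum(hidden_value[i] for i in selected)

theta = [0.0] * N  # one logit per item (a deliberately simple policy)

def sample_selection():
    """Sample K distinct items from a softmax over the logits."""
    items, chosen = list(range(N)), []
    for _ in range(K):
        weights = [math.exp(theta[i]) for i in items]
        r = random.random() * sum(weights)
        acc = 0.0
        for idx, i in enumerate(items):
            acc += weights[idx]
            if acc >= r:
                chosen.append(items.pop(idx))
                break
        else:  # guard against floating-point rounding
            chosen.append(items.pop())
    return chosen

baseline = 0.0  # running reward baseline to reduce gradient variance
for step in range(2000):
    sel = sample_selection()
    reward = downstream_revenue(sel)  # the only feedback available
    baseline += 0.01 * (reward - baseline)
    advantage = reward - baseline
    for i in range(N):
        # Crude score-function surrogate: raise logits of selected items
        # when the advantage is positive, lower them otherwise.
        indicator = 1.0 if i in sel else 0.0
        theta[i] += 0.05 * advantage * (indicator - K / N)

best = sorted(range(N), key=lambda i: theta[i], reverse=True)[:K]
print(best)
```

The point of the sketch is the interface, not the algorithmic details: because the policy only ever consumes a scalar reward from the downstream, the selection scheme keeps adapting even as the downstream changes underneath it.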
Change Data Capture (CDC) and Kafka
Change Data Capture (CDC) is an approach to data integration based on the identification, capture, and delivery of changes made to data sources, typically relational databases. A change operation can be the INSERT of a new record, or an UPDATE or DELETE of an existing one. With Apache Kafka, and in particular with the Kafka Connect API and the available source connectors, it's very easy to create a data pipeline that captures changes from an existing RDBMS and delivers them to a Kafka cluster. From there you can send those changes to downstream systems, typically NoSQL storage systems (such as Cassandra, MongoDB, or Couchbase) or search engines (such as Elasticsearch). It is also possible, and advisable, to keep the changes stored or cached in a Kafka compacted topic: that way, if you want to perform joins via Kafka Streams or KSQL, they can be done easily and efficiently in parallel, with no repartitioning necessary.
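As an example of how little configuration such a pipeline needs, a CDC source connector is typically just a small JSON payload posted to the Kafka Connect REST API. The sketch below uses Debezium's MySQL connector; the hostnames, credentials, and database names are placeholders, and the property names follow Debezium 2.x conventions, so check them against the connector version you actually deploy.

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory"
  }
}
```

Once registered, the connector streams each table's INSERT/UPDATE/DELETE events to its own topic (here prefixed `dbserver1.`), ready for downstream sinks or for compaction and joining.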